Search | WHO COVID-19 Research Database

1.

Unsupervised outlier detection applied to SARS-CoV-2 nucleotide sequences can identify sequences of common variants and other variants of interest (preprint)

Georg Hahn; Sanghun Lee; Dmitry Prokopenko; Jonathan Abraham; Surender Khurana; Christoph Lange.

biorxiv; 2022.

Preprint in English | bioRxiv | ID: ppzbmed-10.1101.2022.05.16.492178

ABSTRACT

As of February 2022, the GISAID database contains more than one million SARS-CoV-2 genomes, including several thousand nucleotide sequences for the most common variants such as delta or omicron. These SARS-CoV-2 strains have been collected from patients around the world since the beginning of the pandemic. We start by assessing the similarity of all pairs of nucleotide sequences using the Jaccard index and principal component analysis. As shown previously in the literature, an unsupervised cluster analysis applied to the SARS-CoV-2 genomes results in clusters of sequences according to certain characteristics such as their strain or their clade. Importantly, we observe that nucleotide sequences of common variants are often outliers in clusters of sequences stemming from variants identified earlier on during the pandemic. Motivated by this finding, we are interested in applying outlier detection to nucleotide sequences. We demonstrate that nucleotide sequences of common variants (such as alpha, delta, or omicron) can be identified solely based on a statistical outlier criterion. We argue that outlier detection might be a useful surveillance tool to identify emerging variants in real time as the pandemic progresses.
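The abstract above does not state which statistical outlier criterion is used. As a minimal sketch only, assuming each sequence has already been reduced to one number (for example its first principal component score, or its mean Jaccard similarity to the rest of its cluster), a robust median/MAD rule could flag candidate outliers. The function names, the 0.6745 scaling constant and the 3.5 cutoff are illustrative assumptions, not taken from the preprint.

```python
from statistics import median


def modified_z_scores(values):
    """Robust z-scores based on the median and the median absolute deviation (MAD)."""
    values = list(values)
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Degenerate case: at least half of the values are identical,
        # so this criterion cannot rank deviations; report no outliers.
        return [0.0 for _ in values]
    # 0.6745 rescales the MAD so it is comparable to a standard deviation
    # under normality (an assumed convention, not from the preprint).
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(values, threshold=3.5):
    """Return the indices whose absolute modified z-score exceeds the threshold."""
    scores = modified_z_scores(values)
    return [i for i, z in enumerate(scores) if abs(z) > threshold]


# Hypothetical per-sequence summary values; the last one sits far from the rest.
similarity_to_cluster = [0.91, 0.93, 0.92, 0.90, 0.94, 0.35]
print(flag_outliers(similarity_to_cluster))
```

In a surveillance setting the flagged indices would only be candidates for closer inspection; the cutoff would need tuning against known variant labels.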

2.

Unsupervised genome-wide cluster analysis: nucleotide sequences of the omicron variant of SARS-CoV-2 are similar to sequences from early 2020 (preprint)

Georg Hahn; Sanghun Lee; Dmitry Prokopenko; Surender Khurana; Scott T. Weiss; Christoph Lange.

biorxiv; 2021.

Preprint in English | bioRxiv | ID: ppzbmed-10.1101.2021.12.29.474469

ABSTRACT

The GISAID database contains more than 100,000 SARS-CoV-2 genomes, including sequences of the recently discovered SARS-CoV-2 omicron variant and of prior SARS-CoV-2 strains that have been collected from patients around the world since the beginning of the pandemic. We applied unsupervised cluster analysis to the SARS-CoV-2 genomes, assessing their similarity at a genome-wide level based on the Jaccard index and principal component analysis. Our analysis results show that the omicron variant sequences are most similar to sequences that have been submitted early in the pandemic around January 2020. Furthermore, the omicron variants in GISAID are spread across the entire range of the first principal component, suggesting that the strain has been in circulation for some time. This observation supports a long-term infection hypothesis as the omicron strain origin.

3.

New susceptibility loci for severe COVID-19 by detailed GWAS analysis in European populations (preprint)

Frauke Degenhardt; David Ellinghaus; Simonas Juzenas; Jon Lerga-Jaso; Mareike Wendorff; Douglas Maya-Miles; Florian Uellendahl-Werth; Hesham ElAbd; Malte C. Ruehlemann; Jatin Arora; Onur Oezer; Ole Bernt Lenning; Ronny Myhre; May Sissel Vadla; Eike Matthias Wacker; Lars Wienbrandt; Aaron Blandino Ortiz; Adolfo de Salazar; Adolfo Garrido Chercoles; Adriana Palom; Agustin Ruiz; Alberto Mantovani; Alberto Zanella; Aleksander Rygh Holten; Alena Mayer; Alessandra Bandera; Alessandro Cherubini; Alessandro Protti; Alessio Aghemo; Alessio Gerussi; Alexander Popov; Alfredo Ramirez; Alice Braun; Almut Nebel; Ana Barreira; Ana Lleo; Ana Teles; Anders Benjamin Kildal; Andrea Biondi; Andrea Ganna; Andrea Gori; Andreas Glueck; Andreas Lind; Anke Hinney; Anna Carreras Nolla; Anna Ludovica Fracanzani; Annalisa Cavallero; Anne Ma Dyrhol-Riise; Antonella Ruello; Antonio Julia; Antonio Muscatello; Antonio Pesenti; Antonio Voza; Ariadna Rando-Segura; Aurora Solier; Beatriz Cortes; Beatriz Mateos; Beatriz Nafria-Jimenez; Benedikt Schaefer; Bjoern Jensen; Carla Bellinghausen; Carlo Maj; Carlos Ferrando; Carmen de la Horra; Carmen Quereda; Carsten Skurk; Charlotte Thibeault; Chiara Scollo; Christian Herr; Christoph D. Spinner; Christoph Lange; Cinzia Hu; Clara Lehmann; Claudio Cappadona; Clinton Azuure; - COVICAT study group; - Covid-19 Aachen Study (COVAS); Cristiana Bianco; Cristina Sancho; Dag Arne Lihaug Hoff; Daniela Galimberti; Daniele Prati; David Haschka; David Jimenez; David Pestana; David Toapanta; Elena Azzolini; Elio Scarpini; Elisa T. Helbig; Eloisa Urrechaga; Elvezia Maria Paraboschi; Emanuele Pontali; Enric Reverter; Enrique J. Calderon; Enrique Navas; Erik Solligard; Ernesto Contro; Eunate Arana; Federico Garcia; Felix Garcia Sanchez; Ferruccio Ceriotti; Filippo Martinelli-Boneschi; Flora Peyvandi; Florian Kurth; Francesco Blasi; Francesco Malvestiti; Francisco J. Medrano; Francisco Mesonero; Francisco Rodriguez-Frias; Frank Hanses; Fredrik Mueller; Giacomo Bellani; Giacomo Grasselli; Gianni Pezzoli; Giorgio Costantino; Giovanni Albano; Giuseppe Bellelli; Giuseppe Citerio; Giuseppe Foti; Giuseppe Lamorte; Holger Neb; Ilaria My; Ingo Kurth; Isabel Hernandez; Isabell Pink; Itziar de Rojas; Ivan Galvan-Femenia; Jan C. Holter; Jan Egil Afset; Jan Heyckendorf; Jan Damas; Jan Kristian Rybniker; Janine Altmueller; Javier Ampuero; Jesus M. Banales; Joan Ramon Badia; Joaquin Dopazo; Jochen Schneider; Jonas Bergan; Jordi Barretina; Joern Walter; Jose Hernandez Quero; Josune Goikoetxea; Juan Delgado; Juan M. Guerrero; Julia Fazaal; Julia Kraft; Julia Schroeder; Kari Risnes; Karina Banasik; Karl Erik Mueller; Karoline I. Gaede; Koldo Garcia-Etxebarria; Kristian Tonby; Lars Heggelund; Laura Izquierdo-Sanchez; Laura Rachele Bettini; Lauro Sumoy; Leif Erik Sander; Lena J. Lippert; Leonardo Terranova; Lindokuhle Nkambule; Lisa Knopp; Lise Tuset Gustad; Lucia Garbarino; Luigi Santoro; Luis Tellez; Luisa Roade; Mahnoosh Ostadreza; Maider Intxausti; Manolis Kogevinas; Mar Riveiro-Barciela; Marc M. Berger; Mari E.K. Niemi; Maria A. Gutierrez-Stampa; Maria Grazia Valsecchi; Maria Hernandez-Tejero; Maria J.G.T. Vehreschild; Maria Manunta; Mariella D'Angio; Marina Cazzaniga; Marit M. Grimsrud; Markus Cornberg; Markus M. Noethen; Marta Marquie; Massimo Castoldi; Mattia Cordioli; Maurizio Cecconi; Mauro D'Amato; Max Augustin; Melissa Tomasi; Merce Boada; Michael Dreher; Michael J. Seilmaier; Michael Joannidis; Michael Wittig; Michela Mazzocco; Miguel Rodriguez-Gandia; Natale Imaz Ayo; Natalia Blay; Natalia Chueca; Nicola Montano; Nicole Ludwig; Nikolaus Marx; Nilda Martinez; - Norwegian SARS-CoV-2 Study group; Oliver A. Cornely; Oliver Witzke; Orazio Palmieri; - Pa COVID-19 Study Group; Paola Faverio; Paolo Bonfanti; Paolo Tentorio; Pedro Castro; Pedro M. Rodrigues; Pedro Pablo Espana; Per Hoffmann; Philip Rosenstiel; Philipp Schommers; Phillip Suwalski; Raul de Pablo; Ricard Ferrer; Robert Bals; Roberta Gualtierotti; Rocio Gallego-Duran; Rosa Nieto; Rossana Carpani; Ruben Morilla; Salvatore Badalamenti; Sammra Haider; Sandra Ciesek; Sandra May; Sara Bombace; Sara Marsal; Sara Pigazzini; Sebastian Klein; Selina Rolker; Serena Pelusi; Sibylle Wilfling; Silvano Bosari; Soren Brunak; Soumya Raychaudhuri; Stefan Schreiber; Stefanie Heilmann-Heimbach; Stefano Aliberti; Stephan Ripke; Susanne Dudman; - The Humanitas COVID-19 Task Force; - The Humanitas Gavazzeni COVID-19 Task Force; Thomas Bahmer; Thomas Eggermann; Thomas Illig; Thorsten Brenner; Torsten Feldt; Trine Folseraas; Trinidad Gonzalez Cejudo; Ulf Landmesser; Ulrike Protzer; Ute Hehr; Valeria Rimoldi; Vegard Skogen; Verena Keitel; Verena Kopfnagel; Vicente Friaza; Victor Andrade; Victor Moreno; Wolfgang Poller; Xavier Farre; Xiaomin Wang; Yascha Khodamoradi; Zehra Karadeniz; Anna Latiano; Siegfried Goerg; Petra Bacher; Philipp Koehler; Florian Tran; Heinz Zoller; Eva C. Schulte; Bettina Heidecker; Kerstin U. Ludwig; Javier Fernandez; Manuel Romero-Gomez; Agustin Albillos; Pietro Invernizzi; Maria Buti; Stefano Duga; Luis Bujanda; Johannes R. Hov; Tobias L. Lenz; Rosanna Asselta; Rafael de Cid; Luca Valenti; Tom H. Karlsen; Mario Caceres; Andre Franke.

medrxiv; 2021.

Preprint in English | medRxiv | ID: ppzbmed-10.1101.2021.07.21.21260624

ABSTRACT

Due to the highly variable clinical phenotype of Coronavirus disease 2019 (COVID-19), deepening the host genetic contribution to severe COVID-19 may further improve our understanding about underlying disease mechanisms. Here, we describe an extended GWAS meta-analysis of 3,260 COVID-19 patients with respiratory failure and 12,483 population controls from Italy, Spain, Norway and Germany, as well as hypothesis-driven targeted analysis of the human leukocyte antigen (HLA) region and chromosome Y haplotypes. We include detailed stratified analyses based on age, sex and disease severity. In addition to already established risk loci, our data identify and replicate two genome-wide significant loci at 17q21.31 and 19q13.33 associated with severe COVID-19 with respiratory failure. These associations implicate a highly pleiotropic ~0.9-Mb 17q21.31 inversion polymorphism, which affects lung function and immune and blood cell counts, and the NAPSA gene, involved in lung surfactant protein production, in COVID-19 pathogenesis.

Subject(s)

COVID-19, Respiratory Insufficiency

4.

Mutations in SARS-CoV-2 spike protein and RNA polymerase complex are associated with COVID-19 mortality risk (preprint)

Georg Hahn; Chloe M. Wu; Sanghun Lee; Julian Hecker; Sharon M. Lutz; Sebastien Haneuse; Nan M. Laird; Katharina Ribbeck; Christoph Lange; Ivet Bahar; Jinan Suliman; Elias Tayar; Hasan Ali Kasem; Meynard J. A. Agsalog; Bassam K. Akkarathodiyil; Ayat A. Alkhalaf; Mohamed Morhaf M. H. Alakshar; Abdulsalam Ali A. H. Al-Qahtani; Monther H. A. Al-Shedifat; Anas Ansari; Ahmad Ali Ataalla; Sandeep Chougule; Abhilash K. K. V. Gopinathan; Feroz J. Poolakundan; Sanjay U. Ranbhise; Saed M. A. Saefan; Mohamed M. Thaivalappil; Abubacker S. Thoyalil; Inayath M. Umar; Zaina Al Kanaani; Abdullatif Al Khal; Einas Al Kuwari; Adeel A. Butt; Peter Coyle; Andrew Jeremijenko; Anvar Hassan Kaleeckal; Ali Nizar Latif; Riyazuddin Mohammad Shaik; Hanan F. Abdul Rahim; Hadi M. Yassine; Gheyath K. Nasrallah; Mohamed G. Al Kuwari; Odette Chaghoury; Hiam Chemaitelly; Laith J Abu-Raddad.

biorxiv; 2020.

Preprint in English | bioRxiv | ID: ppzbmed-10.1101.2020.11.17.386714

ABSTRACT

SARS-CoV-2 mortality has been extensively studied in relationship to a patient's predisposition to the disease. However, how sequence variations in the SARS-CoV-2 genome affect mortality is not understood. To address this issue, we used a whole-genome sequencing (WGS) association study to directly link death of SARS-CoV-2 patients with sequence variation in the viral genome. Specifically, we analyzed 3,626 single stranded RNA-genomes of SARS-CoV-2 patients in the GISAID database (Elbe and Buckland-Merrett, 2017; Shu and McCauley, 2017) with reported patient's health status from COVID-19, i.e. deceased versus non-deceased. In total, evaluating 28,492 loci of the viral genome for association with patient/host mortality, two loci, 12,053bp and 25,088bp, achieved genome-wide significance (p-values of 1.24e-12, and 1.24e-26, respectively). Mutations at 25,088bp occur in the S2 subunit of the SARS-CoV-2 spike protein, which plays a key role in viral entry of target host cells. Additionally, mutations at 12,053bp are within the ORF1ab gene, in a region encoding for the protein nsp7, which is necessary to form the RNA polymerase complex responsible for viral replication and transcription. Both mutations altered amino acid coding sequences, potentially imposing structural changes that could enhance viral infectivity and symptom severity, and may be important to consider as targets for therapeutic development.

Subject(s)

Genomic Instability, COVID-19
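The abstract of this entry reports per-locus p-values across 28,492 loci but does not name the test statistic or the significance cutoff. As a hedged illustration only, one simple way to test a single locus is a two-sided Fisher's exact test on a 2x2 table (mutated vs. not mutated, deceased vs. non-deceased), combined with a Bonferroni-corrected threshold. Both choices, the function names and the example counts are assumptions for this sketch and are not described in the source.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: mutated / not mutated at the locus.
    Columns: deceased / non-deceased.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x mutated-and-deceased samples.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one;
    # the small tolerance guards against floating-point ties.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under a Bonferroni correction."""
    return alpha / n_tests


# 28,492 is the number of loci stated in the abstract; alpha = 0.05 is assumed.
threshold = bonferroni_threshold(0.05, 28492)
# Hypothetical counts for one locus, for illustration only.
p_value = fisher_exact_two_sided(12, 3, 40, 200)
print(p_value < threshold)
```

A regression-based test that adjusts for covariates such as age or sampling location would be a natural alternative; the sketch ignores such confounders.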

5.

Mutations in SARS-CoV-2 spike protein and RNA polymerase complex are associated with COVID-19 mortality risk (preprint)

Christoph Lange; Georg Hahn; Chloe Wu; Sanghun Lee; Julian Hecker; Sharon Lutz; Sebastien Haneuse; Dandi Qiao; Michael Cho; Adrienne Randolph; Nan Laird; Scott Weiss; Edwin Silverman; Katharina Ribbeck.

researchsquare; 2020.

Preprint in English | PREPRINT-RESEARCHSQUARE | ID: ppzbmed-10.21203.rs.3.rs-95183.v1

ABSTRACT

SARS-CoV-2 mortality has been extensively studied in relationship to a patient's predisposition to the disease. However, how sequence variations in the SARS-CoV-2 genome affect mortality is not understood. To address this issue, we used a whole-genome sequencing (WGS) association study to directly link death of SARS-CoV-2 patients with sequence variation in the viral genome. Specifically, we analyzed 3,626 single stranded RNA-genomes of SARS-CoV-2 patients in the GISAID database (Elbe and Buckland-Merrett, 2017; Shu and McCauley, 2017) with reported patient’s health status from COVID-19, i.e. deceased versus non-deceased. In total, evaluating 28,492 loci of the viral genome for association with patient/host mortality, two loci, 12,053bp and 25,088bp, achieved genome-wide significance (p-values of 1.24e-12, and 1.24e-26, respectively). Mutations at 25,088bp occur in the S2 subunit of the SARS-CoV-2 spike protein, which plays a key role in viral entry of target host cells. Additionally, mutations at 12,053bp are within the ORF1ab gene, in a region encoding for the protein nsp7, which is necessary to form the RNA polymerase complex responsible for viral replication and transcription. Both mutations altered amino acid coding sequences, potentially imposing structural changes that could enhance viral infectivity and symptom severity, and may be important to consider as targets for therapeutic development.

Subject(s)

Genomic Instability, COVID-19

6.

Longitudinal multi-omics analysis identifies responses of megakaryocytes, erythroid cells and plasmablasts as hallmarks of severe COVID-19 trajectories (preprint)

Joana P. Bernardes; Neha Mishra; Florian Tran; Thomas Bahmer; Lena Best; Johanna I. Blase; Dora Bordoni; Jeanette Franzenburg; Ulf Geisen; Jonathan Josephs-Spaulding; Philipp Koehler; Axel Kuenstner; Elisa Rosati; Anna C. Aschenbrenner; Petra Bacher; Nathan Baran; Teide Boysen; Burkhard Brandt; Niklas Bruse; Jonathan Doerr; Andreas Draeger; Gunnar Elke; David Ellinghaus; Julia Fischer; Michael Forster; Andre Franke; Soeren Franzenburg; Norbert Frey; Anette Friedrichs; Janina Fuss; Andreas Glueck; Jacob Hamm; Finn Hinrichsen; Marc P. Hoeppner; Simon Imm; Ralf Juenker; Sina Kaiser; Ying H. Kan; Rainer Knoll; Christoph Lange; Georg Laue; Clemes Lier; Matthias Lindner; Georgios Marinos; Robert Markewitz; Jacob Nattermann; Rainer Noth; Peter Pickkers; Klaus F. Rabe; Alina Renz; Christoph Roecken; Jan Rupp; Annika Schaffarzyk; Alexander Scheffold; Jonas Schulte-Schrepping; Domagoj Schunck; Dirk Skowasch; Thomas Ulas; Klaus-Peter Wandinger; Michael Wittig; Johannes Zimmermann; Hauke Busch; Bimba F. Hoyer; Christoph Kaleta; Jan Heyckendorf; Matthijs Kox; Jan Rybniker; Stefan Schreiber; Joachim Schultze; Philip Rosenstiel; - HCA Lung Biological Network; - Deutsche COVID-19 Omics Initiative (DeCOI).

medrxiv; 2020.

Preprint in English | medRxiv | ID: ppzbmed-10.1101.2020.09.11.20187369

ABSTRACT

The pandemic spread of the potentially life-threatening disease COVID-19 requires a thorough understanding of the longitudinal dynamics of host responses. Temporal resolution of cellular features associated with a severe disease trajectory will be a pre-requisite for finding disease outcome predictors. Here, we performed a longitudinal multi-omics study using a two-centre German cohort of 13 patients (from Cologne and Kiel, cohort 1). We analysed the bulk transcriptome, bulk DNA methylome, and single-cell transcriptome (>358,000 cells, including BCR profiles) of peripheral blood samples harvested from up to 5 time points. The results from single-cell and bulk transcriptome analyses were validated in two independent cohorts of COVID-19 patients from Bonn (18 patients, cohort 2) and Nijmegen (40 patients, cohort 3), respectively. We observed an increase of proliferating, activated plasmablasts in severe COVID-19, and show a distinct expression pattern related to a hyperactive cellular metabolism of these cells. We further identified a notable expansion of type I IFN-activated circulating megakaryocytes and their progenitors, indicative of emergency megakaryopoiesis, which was confirmed in cohort 2. These changes were accompanied by increased erythropoiesis in the critical phase of the disease with features of hypoxic signalling. Finally, projecting megakaryocyte- and erythroid cell-derived co-expression modules to longitudinal blood transcriptome samples from cohort 3 confirmed an association of early temporal changes of these features with fatal COVID-19 disease outcome. In sum, our longitudinal multi-omics study demonstrates distinct cellular and gene expression dynamics upon SARS-CoV-2 infection, which point to metabolic shifts of circulating immune cells, and reveals changes in megakaryocytes and increased erythropoiesis as important outcome indicators in severe COVID-19 patients.

Subject(s)

COVID-19

7.

Unsupervised cluster analysis of SARS-CoV-2 genomes indicates that recent (June 2020) cases in Beijing are from a genetic subgroup that consists of mostly European and South(east) Asian samples, of which the latter are the most recent (preprint)

Georg Hahn; Christoph Lange.

biorxiv; 2020.

Preprint in English | bioRxiv | ID: ppzbmed-10.1101.2020.06.22.165936

ABSTRACT

Research efforts of the ongoing SARS-CoV-2 pandemic have focused on viral genome sequence analysis to understand how the virus spread across the globe. Here, we assess three recently identified SARS-CoV-2 genomes in Beijing from June 2020 and attempt to determine the origin of these genomes, made available in the GISAID database. The database contains fully or partially sequenced SARS-CoV-2 samples from laboratories around the world. Including the three new samples and excluding samples with missing annotations, we analyzed 7,643 SARS-CoV-2 genomes. Using principal component analysis computed on a similarity matrix that compares all pairs of the SARS-CoV-2 nucleotide sequences at all loci simultaneously, using the Jaccard index, we find that the newly discovered virus genomes from Beijing are in a genetic cluster that consists mostly of cases from Europe and South(east) Asia. The sequences of the new cases are most related to virus genomes from a small number of cases from China (March 2020), cases from Europe (February to early May 2020), and cases from South(east) Asia (May to June 2020). These findings could suggest that the original cases of this genetic cluster originated from China in March 2020 and were re-introduced to China by transmissions from samples from South(east) Asia between April and June 2020.

8.

Unsupervised cluster analysis of SARS-CoV-2 genomes reflects its geographic progression and identifies distinct genetic subgroups of SARS-CoV-2 virus (preprint)

Georg Hahn; Sanghun Lee; Christoph Lange.

biorxiv; 2020.

Preprint in English | bioRxiv | ID: ppzbmed-10.1101.2020.05.05.079061

ABSTRACT

Over 10,000 viral genome sequences of the SARS-CoV-2 virus have been made readily available during the ongoing coronavirus pandemic since the initial genome sequence of the virus was released on the open access Virological website (http://virological.org/) early on January 11. We utilize the published data on the single stranded RNAs of 11,132 SARS-CoV-2 patients in the GISAID (Elbe and Buckland-Merrett, 2017; Shu and McCauley, 2017) database, which contains fully or partially sequenced SARS-CoV-2 samples from laboratories around the world. Among many important research questions which are currently being investigated, one aspect pertains to the genetic characterization/classification of the virus. We analyze data on the nucleotide sequencing of the virus and geographic information of a subset of 7,640 SARS-CoV-2 patients without missing entries that are available in the GISAID database. Instead of modelling the mutation rate, applying phylogenetic tree approaches, etc., we here utilize a model-free clustering approach that compares the viruses at a genome-wide level. We apply principal component analysis to a similarity matrix that compares all pairs of these SARS-CoV-2 nucleotide sequences at all loci simultaneously, using the Jaccard index (Jaccard, 1901; Tan et al., 2005; Prokopenko et al., 2016; Schlauch et al., 2017). Our analysis results of the SARS-CoV-2 genome data illustrate the geographic and chronological progression of the virus, starting from the first cases that were observed in China to the current wave of cases in Europe and North America. This is in line with a phylogenetic analysis which we use to contrast our results. We also observe that, based on their sequence data, the SARS-CoV-2 viruses cluster in distinct genetic subgroups. It is the subject of ongoing research to examine whether the genetic subgroup could be related to disease outcome and its potential implications for vaccine development.
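The approach described in this abstract (a pairwise Jaccard similarity matrix followed by principal component analysis) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: it assumes each genome has already been aligned and encoded as a binary vector with 1 marking a locus that differs from a reference sequence, it uses NumPy, and the function names and the column-centering step are choices made for this sketch.

```python
import numpy as np


def jaccard_similarity_matrix(X):
    """Pairwise Jaccard index between the rows of a binary matrix.

    X has shape (n_sequences, n_loci); an entry of 1 marks a mismatch
    to the reference at that locus (an assumed encoding).
    """
    X = np.asarray(X, dtype=float)
    intersection = X @ X.T                      # loci mutated in both sequences
    counts = X.sum(axis=1)
    union = counts[:, None] + counts[None, :] - intersection
    safe_union = np.where(union > 0, union, 1.0)
    # Two sequences without any mutations are treated as identical.
    return np.where(union > 0, intersection / safe_union, 1.0)


def principal_component_scores(S, k=2):
    """Project the rows of the similarity matrix onto its first k principal components."""
    centered = S - S.mean(axis=0)               # center each column
    U, singular_values, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :k] * singular_values[:k]


# Toy example: two pairs of identical mutation patterns.
X = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]
S = jaccard_similarity_matrix(X)
scores = principal_component_scores(S, k=1)
print(S)
print(scores.ravel())
```

In the toy example the two pairs land at opposite ends of the first principal component, which is the kind of separation a subsequent clustering step would pick up. The sign of each component is arbitrary.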
